101 research outputs found

Gender bias and natural language processing

Author: Costa-jussà Marta R.
Publication venue: Barcelona Supercomputing Center
Publication date: 01/01/2020

Demographic biases widely affect artificial intelligence. In particular, gender bias is widespread in natural language processing applications, ranging from stereotyped translations to poorer speech recognition for women than for men. In this talk, I give an overview of the research and challenges currently emerging in the effort toward fairer natural language processing in terms of gender.

UPCommons. Portal del coneixement obert de la UPC

Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources

Author: Biesialska Magdalena
Costa-jussà Marta R.
Rafieian Bardia
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations between words. In our approach, we leverage external knowledge from rich lexical resources such as BabelNet. We also show that our proposed post-specialization method, based on an adversarial neural network with the Wasserstein distance, yields improvements over state-of-the-art methods on two tasks: word similarity and dialog state tracking. Comment: Accepted to ACL 2020 SR

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Evaluating the Underlying Gender Bias in Contextualized Word Embeddings

Author: Basta Christine
Casas Noe
Costa-jussà Marta R.
Publication date: 01/01/2019

Gender bias strongly affects natural language processing applications. Word embeddings have been shown both to preserve and to amplify gender biases that are present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations that depend on the sentence in which the words appear. In this paper, we study the impact of this conceptual change in the word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased.
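One measure that the literature has applied to standard word embeddings is the projection of a word vector onto a gender direction. The sketch below is a minimal illustration of that idea and not the paper's evaluation code: the function names (`gender_direction`, `direct_bias`) and the three-dimensional toy vectors are assumptions made for illustration, and the abstract does not state which measures the authors used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_direction(he, she):
    """Hypothetical helper: difference vector between a male and a female anchor word."""
    return [a - b for a, b in zip(he, she)]


def direct_bias(word_vectors, direction):
    """Mean absolute cosine between each word vector and the gender direction."""
    return sum(abs(cosine(w, direction)) for w in word_vectors) / len(word_vectors)


# Toy vectors, invented for illustration only (not real embeddings).
he = [1.0, 0.0, 0.5]
she = [-1.0, 0.0, 0.5]
direction = gender_direction(he, she)

neutral_word = [0.0, 1.0, 0.0]  # orthogonal to the gender direction
biased_word = [3.0, 0.0, 0.0]   # parallel to the gender direction

score = direct_bias([neutral_word, biased_word], direction)
print(score)
```

A score near 0 means the words carry little component along the chosen direction; a score near 1 means they lie almost entirely along it. With contextualized embeddings, the same word has a different vector in each sentence, so a comparison of this kind would need to aggregate over sentences.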

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts

Author: A Lavie
D Chiang
David Aldón
JL Fleiss
José A. R. Fonollosa
Marta R. Costa-jussà
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Recently, machine translation systems based on neural networks have reached state-of-the-art results for some language pairs (e.g., German–English). In this paper, we investigate the performance of neural machine translation for Chinese–Spanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of words or characters and their corresponding bitmap fonts. Interpreting every word or character as a bitmap font yields more informed vector representations. The best results are obtained when using words plus their bitmap fonts, with an improvement over a competitive neural MT baseline system of almost six BLEU points and five METEOR points, and the system is also ranked better in the human evaluation. Peer reviewed. Postprint (published version).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Character-level Intra Attention Network for Natural Language Inference

Author: Costa-jussà Marta R.
Fonollosa José A. R.
Yang Han
Publication date: 01/01/2017

Natural language inference (NLI) is a central problem in language understanding. End-to-end artificial neural networks have recently reached state-of-the-art performance in the NLI field. In this paper, we propose the Character-level Intra Attention Network (CIAN) for the NLI task. In our model, we use a character-level convolutional network to replace the standard word embedding layer, and we use intra attention to capture intra-sentence semantics. The proposed CIAN model provides improved results on the newly published MNLI corpus. Comment: EMNLP Workshop RepEval 2017: The Second Workshop on Evaluating Vector Space Representations for NL
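The idea of intra attention, in which each position of a sentence attends to the other positions of the same sentence, can be sketched as follows. This is a generic dot-product formulation chosen for illustration; the abstract does not specify the scoring function used in CIAN, and the name `intra_attention` and the toy input are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intra_attention(vectors):
    """For each position, return a weighted average of all positions in the
    same sentence, weighted by softmax-normalized dot-product similarity."""
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) for key in vectors]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        )
    return attended


# Toy "sentence" of three 2-dimensional token vectors (invented values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = intra_attention(sentence)
print(context)
```

Each output vector has the same dimension as the inputs and is a convex combination of them, so every position receives a summary of the rest of the sentence.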

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Refinement of Unsupervised Cross-Lingual Word Embeddings

Author: Biesialska Magdalena
Costa-jussà Marta R.
Publication venue: 'IOS Press'
Publication date: 01/01/2020

Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without any direct bilingual signal. The lion's share of the methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods are mostly based on an orthogonal transformation, which assumes that the language vector spaces are isomorphic. However, this assumption does not necessarily hold, especially for morphologically rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves the vectors of words and their corresponding translations closer to each other and enforces length and center invariance, thus allowing better alignment of cross-lingual embeddings. The experimental results demonstrate the effectiveness of our approach, as in most cases it outperforms state-of-the-art methods in a bilingual lexicon induction task. Comment: Accepted at the 24th European Conference on Artificial Intelligence (ECAI 2020)
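Length normalization and mean centering are a common preprocessing step for projection-based alignment, and they give a rough picture of what length and center invariance mean in practice. The sketch below shows only that preprocessing, under the assumption of tiny invented vectors; it is not the self-supervised refinement the paper proposes, and the abstract does not say how the model enforces these properties. The names `normalize_length` and `center` are hypothetical.

```python
import math


def normalize_length(vectors):
    """Scale every vector to unit Euclidean length (zero vectors are left as-is)."""
    result = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        result.append([x / norm for x in v] if norm > 0 else list(v))
    return result


def center(vectors):
    """Subtract the per-dimension mean so the set of vectors is centered at the origin."""
    count = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / count for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in vectors]


# Toy monolingual embedding matrix (invented values, one row per word).
toy_embeddings = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]

unit = normalize_length(toy_embeddings)
centered = center(unit)
print(centered)
```

In a projection-based pipeline, a step of this kind would typically be applied to the source and target embeddings separately before a mapping between the two spaces is learned.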

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC